A Statistical Framework for Spatial Comparative Genomics

Authors

  • Rose Hoberman
  • Andrew Moore
  • Russell Schwartz
  • Jeffrey Lawrence
Abstract

Comparison of the spatial organization of related genomes reveals a wealth of information about how complex biological systems evolve and function. A fundamental task in spatial comparative genomics is identification of homologous genomic regions, regions that have descended from a common region in an ancestral genome. While closely related regions are characterized by conserved gene content and order, in more distantly related genomes homologous regions will be apparent only as gene clusters, pairs of regions with similar, but not identical, gene content and scrambled gene order. As gene content and order diverge, statistical tests to reject the null hypothesis that these regions share genes by chance become essential. In this thesis, I provide statistical tests to assess the significance of gene clusters for a variety of biological questions and search scenarios. I present the first formal statistical framework for the max-gap cluster, the most widely used cluster definition in genomic analyses. This framework provides statistical tests for two common search scenarios and facilitates principled selection of parameter values prior to conducting a search for gene clusters. Second, I propose novel statistical tests for clusters spanning three genomic regions, for two comparative genomics applications: analysis of conserved linkage within multiple species and identification of large-scale duplications. Multi-genome clusters are of increasing importance, yet existing tests focus almost exclusively on pairwise comparisons. My results demonstrate that simultaneously considering information from more than two regions dramatically improves sensitivity over pairwise methods. Third, I demonstrate the importance of incorporating cluster statistics in algorithms for spatial comparative genomics. Orthologs, genes that descended from a common ancestor through speciation, are the fundamental unit of comparison in many comparative genomics applications. 
Using my statistical framework for evaluating max-gap clusters, I develop a new method for ortholog prediction based on conserved spatial organization. Using statistical significance to rank conserved patterns makes it possible to accommodate a variety of spatial features in a single framework, yielding a method that can be applied to a broad range of genomic data sets. This flexible framework outperforms current spatial ortholog prediction methods, especially on highly diverged genomes.


Similar resources

VIMSS Computational Core: Comparative Analysis of Regulatory Systems in Environmental Microbes

Background. The VIMSS Computational Core group is tasked with data management, statistical data analysis, modeling, and comparative and evolutionary genomics for the larger VIMSS effort. We have matured many of our analyses and VIMSS data into our flagship comparative functional microbial genomics tool MicrobesOnline (http://microbesonline.org). We have used this framework to interpret the data...


VIMSS Computational Microbiology Core Research on Comparative and Functional Genomics

Background. The VIMSS Computational Core group is tasked with data management, statistical analysis, and comparative and evolutionary genomics for the larger VIMSS effort. In the early years of this project, we focused on genome sequence analysis including development of an operon prediction algorithm which has been validated across a number of phylogenetically diverse species. Recently, the Co...


Determination of Optimal Sampling Design for Spatial Data Analysis

Extended Abstract. Inferences for spatial data are affected substantially by the spatial configuration of the network of sites where measurements are taken. Consider the following standard data-model framework for spatial data. Suppose a continuous, spatially-varying quantity, Z, is to be observed at a predetermined number, n, of points ...


Statistical and Combinatorial Aspects of Comparative Genomics*

This document presents a survey of the statistical and combinatorial aspects of four areas of comparative genomics: gene order based measures of evolutionary distances between species, construction of phylogenetic trees, detection of horizontal transfer of genes, and detection of ancient whole genome duplications.


Comparative genomics of human stem cell factor (SCF)

Stem cell factor (SCF) is a critical protein with key roles in the cell such as hematopoiesis, gametogenesis and melanogenesis. In the present study a comparative analysis on nucleotide sequences of SCF was performed in Humanoids using bioinformatics tools including NCBI-BLAST, MEGA6, and JBrowse. Our analysis of nucleotide sequences to find closely evolved organisms with high similarity by NCB...



Publication date: 2007